Combining TLS and MLS: An experiment

We did a thing. We combined TLS and MLS into a hybrid protocol. Of course, when things get serious, full names are in order: We combined the Transport Layer Security protocol and the Messaging Layer Security protocol. This blog post is about the why and the how.
Why we did this
When TLS 1.3 was standardized in August 2018, it represented a big step up from TLS 1.2 in terms of security, performance, and overall protocol design. During the standardization process, TLS 1.3 was scrutinized and proven secure by various security research teams. It is a great overall protocol for encrypted communication on the web and the many applications that run on it. However, even a protocol as great as TLS 1.3 doesn’t fit every use case for encrypted communication.
Long-running connections
Some applications require connections to persist over longer periods, such as days, weeks, or even months. In those scenarios, state compromise during the lifetime of a connection is a relevant part of the threat model.
The main mitigation for state compromise in security protocols is to update key material regularly, which can be done in two different ways. TLS 1.3 supports simple key updates, in which a new traffic secret is derived from the current one. This provides forward secrecy: because the derivation is one-way and the old secret is deleted, compromising the new key does not allow decryption of messages encrypted under the old key.
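To make the forward-secrecy ratchet concrete, here is a minimal Python sketch of how TLS 1.3 computes the next application traffic secret (RFC 8446, Sections 7.1 and 7.2). It assumes a SHA-256 cipher suite and uses only the standard library. The all-zero starting secret is a placeholder.

```python
import hashlib
import hmac

HASH_LEN = 32  # Hash.length for a SHA-256 cipher suite (assumption)


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869) instantiated with HMAC-SHA-256."""
    output, block, counter = b"", b"", 1
    while len(output) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        output += block
        counter += 1
    return output[:length]


def hkdf_expand_label(secret: bytes, label: str, context: bytes, length: int) -> bytes:
    """HKDF-Expand-Label from RFC 8446, Section 7.1."""
    full_label = b"tls13 " + label.encode("ascii")
    hkdf_label = (
        length.to_bytes(2, "big")                  # uint16 length
        + bytes([len(full_label)]) + full_label    # opaque label<7..255>
        + bytes([len(context)]) + context          # opaque context<0..255>
    )
    return hkdf_expand(secret, hkdf_label, length)


def next_traffic_secret(current: bytes) -> bytes:
    """application_traffic_secret_N+1 = HKDF-Expand-Label(secret_N, "traffic upd", "", Hash.length)."""
    return hkdf_expand_label(current, "traffic upd", b"", HASH_LEN)


secret = bytes(HASH_LEN)  # placeholder; a real secret comes out of the handshake
for _ in range(3):
    secret = next_traffic_secret(secret)  # the previous secret is overwritten
```

HKDF-Expand is one-way, so holding the newest secret does not reveal any earlier ones, provided that each old secret is deleted once it has been replaced.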
The other way to update key material is to derive a new key from the current one while at the same time injecting fresh key material. This is more powerful because it allows the updating party to recover from state compromise: the adversary gets locked out of the connection (assuming it is not actively interfering during the update). However, it adds some complexity, because the freshly injected key material must be sent securely to the other party.
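The second input is what distinguishes this from a plain ratchet. The following minimal sketch assumes SHA-256 HKDF (RFC 5869) and does not show how the fresh material is transported. The current secret serves as the salt and the fresh material as the input keying material, roughly as MLS extracts each new epoch from the previous `init_secret` and a fresh `commit_secret`. The function name `ratchet_with_fresh_material` is hypothetical.

```python
import hashlib
import hmac
import secrets


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869) with SHA-256: PRK = HMAC(salt, IKM)."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def ratchet_with_fresh_material(current: bytes, fresh: bytes) -> bytes:
    """Derive the next secret from both the current secret and fresh entropy."""
    return hkdf_extract(salt=current, ikm=fresh)


# The updating party samples fresh material. Delivering it confidentially to
# the peer is the hard part (MLS encrypts it with HPKE); here both sides are
# simply assumed to hold it.
current = secrets.token_bytes(32)
fresh = secrets.token_bytes(32)
new_secret = ratchet_with_fresh_material(current, fresh)

# An adversary who stole `current` but never saw `fresh` can only guess.
guess = ratchet_with_fresh_material(current, secrets.token_bytes(32))
```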
TLS 1.3 supports recovery from compromise only through a new handshake that includes a fresh (EC)DHE exchange, either a full handshake or a session resumption with (EC)DHE. Because TLS 1.3 removed renegotiation, this requires establishing a new connection, which interrupts the current one. For applications that rely on long-running, uninterrupted connections, TLS 1.3 is therefore not ideal.
There are two other reasons why TLS 1.3 could be less than ideal for some deployment scenarios.
Post-quantum security
With continuous advances in quantum computing technology, the harvest-now-decrypt-later adversary is increasingly part of the threat model for many individuals, companies, and institutions. The NIST competition to standardize a post-quantum secure KEM was not even close to finished when TLS 1.3 was standardized, so TLS 1.3 does not include any post-quantum secure key exchange by default. The TLS working group at the IETF is close to finishing a post-quantum secure variant, but neither the specification nor implementations are quite there yet.
X.509 certificates
In theory, the specification supports other credential types, such as raw public keys (RFC 7250). In practice, however, TLS 1.3 implementations almost exclusively support X.509 certificates. This makes sense for backwards compatibility and for use with the Web-PKI. However, X.509 as a format is more complex than most non-web usage scenarios require, where either raw public keys or a simpler certificate scheme would suffice.
An experiment
All of the problems mentioned above are specific to the key agreement component of TLS 1.3. We therefore decided to run an experiment: fusing the record protection layer of TLS 1.3 with the key agreement mechanism of the Messaging Layer Security protocol.
A quick introduction to MLS
The Messaging Layer Security (MLS) protocol, standardized as RFC 9420 in 2023, is a group messaging protocol at the core of which lies a Continuous Group Key Agreement (CGKA) protocol. MLS’ CGKA component allows two or more parties to agree on a shared secret, which can be updated continuously after the initial key agreement.
MLS is a modern messaging protocol and is inspired by TLS 1.3 in many ways. However, it is more flexible and is not hampered by requirements for backwards compatibility or the need to support existing infrastructure.
MLS is crypto-agile in the same way as TLS 1.3, except that its ciphersuites are based on abstract key encapsulation mechanisms (KEMs). New ciphersuites can therefore use designs such as ML-KEM or hybrid constructions without any changes to the protocol itself.
Through its CGKA component, MLS allows two or more parties (or group members in the MLS jargon) to agree on a shared secret. As a secure messaging protocol, it is built with long-running asynchronous connections in mind, and as implied by the name, key material can be updated continuously. This allows group members to update their key material to achieve both fine-grained forward secrecy (FS) and post-compromise security (PCS).
Authentication in MLS is based on raw signature public keys. As such, MLS itself is agnostic to the kind of credential used under the hood and to whether a meaningful identity is bound to a given key. As a consequence, MLS supports any type of credential (X.509 or otherwise), as well as both unilateral and mutual authentication.
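MLS only sees signature public keys, so deciding whether a key belongs to the right peer is left to the application. The following minimal sketch shows one such policy and assumes that raw public keys are pinned by their SHA-256 fingerprint. `PinnedKeyPolicy` and its methods are hypothetical and not part of OpenMLS. Setting `require_peer_auth` to false models unilateral authentication, where the peer's key is accepted without being bound to an identity.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Set


def fingerprint(public_key: bytes) -> str:
    """SHA-256 fingerprint of a raw signature public key."""
    return hashlib.sha256(public_key).hexdigest()


@dataclass
class PinnedKeyPolicy:
    """Hypothetical application policy that authenticates peers by key pinning."""

    pinned: Set[str] = field(default_factory=set)
    require_peer_auth: bool = True

    def pin(self, public_key: bytes) -> None:
        self.pinned.add(fingerprint(public_key))

    def accept(self, peer_signature_key: bytes) -> bool:
        if not self.require_peer_auth:
            return True  # unilateral: the peer stays unauthenticated
        return fingerprint(peer_signature_key) in self.pinned
```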
Like TLS 1.3, MLS has undergone a rigorous standardization process and has been thoroughly analyzed by security researchers from a number of universities and research labs.
MLS-TLS Combination
As both TLS 1.3 and MLS are modern security protocols with a modular architecture, combining them was a simple matter of swapping out the key agreement component of TLS 1.3 with the CGKA component of MLS.
Combining components of two secure protocols does not necessarily yield a secure protocol. However, key agreement and message or stream encryption are typically analyzed separately, followed by an argument that the two compose securely. We are therefore confident that our use of the two components does not undermine their individual security and that the combination yields a secure protocol in its own right.
How it works
As is common practice for the design of secure channel protocols, our approach composes a key agreement mechanism with a message encryption mechanism. In this case, both parties use MLS to establish a shared key and the AEAD-based TLS record layer for message encryption.
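On the record-layer side, the TLS 1.3 machinery only needs a traffic secret. The following minimal sketch assumes that each party obtains that secret from its MLS group, for example through the MLS exporter (the exact derivation MLS-TLS uses is not shown here). It also assumes an AES-128-GCM cipher suite. The write key and IV follow RFC 8446, Section 7.3, and the per-record nonce follows Section 5.3. The function names are hypothetical.

```python
import hashlib
import hmac
from typing import Tuple


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869) instantiated with HMAC-SHA-256."""
    output, block, counter = b"", b"", 1
    while len(output) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        output += block
        counter += 1
    return output[:length]


def hkdf_expand_label(secret: bytes, label: str, context: bytes, length: int) -> bytes:
    """HKDF-Expand-Label from RFC 8446, Section 7.1."""
    full_label = b"tls13 " + label.encode("ascii")
    info = (length.to_bytes(2, "big") + bytes([len(full_label)]) + full_label
            + bytes([len(context)]) + context)
    return hkdf_expand(secret, info, length)


def record_protection_keys(traffic_secret: bytes, key_len: int = 16,
                           iv_len: int = 12) -> Tuple[bytes, bytes]:
    """Write key and IV for one direction (sizes assume AES-128-GCM)."""
    key = hkdf_expand_label(traffic_secret, "key", b"", key_len)
    iv = hkdf_expand_label(traffic_secret, "iv", b"", iv_len)
    return key, iv


def record_nonce(iv: bytes, sequence_number: int) -> bytes:
    """Left-pad the 64-bit record sequence number to the IV length and XOR it in."""
    padded = sequence_number.to_bytes(len(iv), "big")
    return bytes(a ^ b for a, b in zip(iv, padded))


# Placeholder standing in for a secret exported from the MLS group.
key, iv = record_protection_keys(bytes(32))
```

Under a single key, the sequence number never repeats, so the nonce never repeats either. Whenever the key changes, TLS 1.3 resets the sequence number to zero and derives a new key and IV.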
Multiplexing application and handshake messages
Because MLS is a CGKA, it does not stop after the initial key has been agreed upon. Instead, fresh key material is injected periodically to achieve post-compromise security. These updates can be triggered either by elapsed time or by the amount of data sent through the channel.
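The trigger logic can be as simple as two budgets. The following sketch uses a hypothetical `UpdateTrigger` class with illustrative thresholds and an injectable clock for testing. The actual policy and values used in MLS-TLS are not specified here.

```python
import time
from typing import Callable


class UpdateTrigger:
    """Hypothetical policy: an update is due once either budget is exhausted."""

    def __init__(self, max_seconds: float, max_bytes: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_seconds = max_seconds
        self.max_bytes = max_bytes
        self.clock = clock
        self.reset()

    def reset(self) -> None:
        """Call after an update has been applied."""
        self.last_update = self.clock()
        self.bytes_sent = 0

    def record_sent(self, n: int) -> None:
        self.bytes_sent += n

    def update_due(self) -> bool:
        elapsed = self.clock() - self.last_update
        return elapsed >= self.max_seconds or self.bytes_sent >= self.max_bytes


# Illustrative budgets: one hour or 1 GiB, whichever comes first.
now = [0.0]
trigger = UpdateTrigger(max_seconds=3600, max_bytes=1 << 30, clock=lambda: now[0])
trigger.record_sent(1500)
```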
Alongside the MLS updates, the protocol also makes use of the TLS-specific key updates for forward secrecy. Since the latter are cheaper both computationally and bandwidth-wise, they can be applied at a higher frequency. Both the PCS and the FS updates are transmitted through the secure channel alongside the application payloads.
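To send both kinds of updates alongside application data, each decrypted record has to indicate what it carries. The following minimal sketch assumes a hypothetical one-byte type tag and the names `ContentType` and `Dispatcher`. TLS 1.3 itself places the real content type inside the encrypted record (`TLSInnerPlaintext`, after the content rather than before it), so an on-path observer cannot tell updates from application data by looking at the record header.

```python
from enum import IntEnum
from typing import Callable, Dict, Tuple


class ContentType(IntEnum):
    """Hypothetical inner content types; the numeric values are illustrative."""
    APPLICATION_DATA = 1
    TLS_KEY_UPDATE = 2   # cheap forward-secrecy ratchet, sent often
    MLS_COMMIT = 3       # fresh key material for post-compromise security, sent rarely


def encode(content_type: ContentType, payload: bytes) -> bytes:
    """Tag the payload with its type; the result is what gets encrypted."""
    return bytes([content_type]) + payload


def decode(plaintext: bytes) -> Tuple[ContentType, bytes]:
    """Split a decrypted record into type and payload (ValueError on unknown tags)."""
    return ContentType(plaintext[0]), plaintext[1:]


class Dispatcher:
    """Routes decrypted records to the component responsible for them."""

    def __init__(self) -> None:
        self.handlers: Dict[ContentType, Callable[[bytes], None]] = {}

    def on(self, content_type: ContentType, handler: Callable[[bytes], None]) -> None:
        self.handlers[content_type] = handler

    def handle(self, plaintext: bytes) -> None:
        content_type, payload = decode(plaintext)
        # Raises KeyError if no handler is registered for this type.
        self.handlers[content_type](payload)
```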
Pluggable transport layer
MLS-TLS requires the underlying transport layer to provide reliable transmission of messages, but is otherwise independent of the specific protocol. Thus, much like DTLS was developed as a UDP variant of TLS, MLS-TLS can be adapted to any underlying transport layer protocol.
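Transport independence can be expressed as a small interface for sending and receiving whole messages, reliably and in order. The following sketch assumes hypothetical `MessageTransport` and `StreamTransport` types. Over TCP, message boundaries are restored with a 4-byte length prefix. A UDP variant would need its own adapter that handles loss and reordering.

```python
import struct
from typing import BinaryIO, Protocol


class MessageTransport(Protocol):
    """What MLS-TLS needs from below: reliable, ordered delivery of whole messages."""

    def send(self, message: bytes) -> None: ...
    def recv(self) -> bytes: ...


class StreamTransport:
    """Adapts a reliable byte stream (such as TCP) using length-prefixed framing."""

    def __init__(self, reader: BinaryIO, writer: BinaryIO) -> None:
        self.reader, self.writer = reader, writer

    def send(self, message: bytes) -> None:
        self.writer.write(struct.pack("!I", len(message)) + message)
        self.writer.flush()

    def recv(self) -> bytes:
        (length,) = struct.unpack("!I", self._read_exact(4))
        return self._read_exact(length)

    def _read_exact(self, n: int) -> bytes:
        data = b""
        while len(data) < n:
            chunk = self.reader.read(n - len(data))
            if not chunk:
                raise ConnectionError("stream closed mid-message")
            data += chunk
        return data
```

With a TCP socket, `StreamTransport(sock.makefile("rb"), sock.makefile("wb"))` is one way to wire it up.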
Implicit delivery service
In group messaging scenarios, MLS requires a delivery service (DS), which ensures that all group members agree on the order of messages. In a two-party scenario, however, no additional party is required to perform this role. While messages can still cross on the wire, the fixed roles of the two participants help establish the message ordering.
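Fixed roles can settle races through a simple precedence rule. The following sketch assumes, hypothetically, that the server's commit always wins. The server applies its own commits immediately. The client keeps its commit pending until the server either acknowledges it or sends a competing commit for the same epoch. The sketch also assumes reliable, in-order delivery. It illustrates the idea and is not necessarily the rule MLS-TLS uses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Role(Enum):
    CLIENT = "client"
    SERVER = "server"


@dataclass
class Commit:
    epoch: int      # epoch in which the commit was created
    payload: bytes


class Member:
    """Hypothetical two-party ordering rule: the server's commits always win."""

    def __init__(self, role: Role) -> None:
        self.role = role
        self.epoch = 0
        self.pending: Optional[Commit] = None
        self.history: List[bytes] = []

    def _apply(self, commit: Commit) -> None:
        self.history.append(commit.payload)
        self.epoch += 1

    def create_commit(self, payload: bytes) -> Commit:
        commit = Commit(self.epoch, payload)
        if self.role is Role.SERVER:
            self._apply(commit)      # the server never has to wait
        else:
            self.pending = commit    # the client waits for the server's verdict
        return commit

    def on_commit(self, commit: Commit) -> Optional[int]:
        """Handle the peer's commit; the server returns the epoch it acknowledges."""
        if commit.epoch != self.epoch:
            return None              # stale: a competing commit already won
        if self.role is Role.CLIENT:
            self.pending = None      # the server's commit beats our pending one
            self._apply(commit)
            return None
        self._apply(commit)
        return commit.epoch          # the server acknowledges the client's commit

    def on_ack(self, epoch: int) -> None:
        """Client: the server accepted our pending commit for `epoch`."""
        if self.pending is not None and self.pending.epoch == epoch:
            self._apply(self.pending)
            self.pending = None
```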
Where we tested it
To ascertain that the protocol works as intended in a real-world setting, we built a custom VPN application. The application’s inner workings are similar to other VPN apps, where traffic is routed from a client to a server and encrypted along the way. This allowed us to monitor the throughput at the application level.
The result was that the bandwidth and latency were not meaningfully impacted, and we could use typical applications in a real-world deployment. While this particular deployment was implemented as a road-warrior-type VPN, nothing would prevent deployments of full-tunnel remote-access or site-to-site VPNs.
We have open-sourced the core of the protocol. Our implementation is based on OpenMLS (our MLS implementation written in Rust).
Outlook
This project was primarily carried out to study the feasibility of this approach. It will likely serve as a starting point to explore further in the following areas:
Adapting to other protocols
Using the new protocol to replace TLS was a natural first step. Running a transport encryption protocol over TCP can, however, have its limitations. In VPN applications in particular, tunneling TCP traffic over a TCP connection stacks two layers of retransmission and congestion control that can interfere with each other. A more natural choice might therefore be to run it over UDP and thus replace DTLS. We estimate that this is just as feasible, but it requires more work to handle retransmission of handshake messages lost in transit.
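Over UDP, a lost handshake message simply never arrives, so the sender has to retransmit on a timer. The following minimal sketch uses simulated time and assumes a hypothetical `send` callback that reports whether a flight was acknowledged. The timeout values are illustrative, and the doubling backoff is similar in spirit to DTLS's retransmission timer.

```python
from typing import Callable, Optional


def send_until_acked(send: Callable[[bytes], bool], flight: bytes,
                     initial_timeout: float = 1.0, max_timeout: float = 60.0,
                     max_attempts: int = 10) -> Optional[float]:
    """Retransmit a handshake flight with exponential backoff until acknowledged.

    `send` returns True if the flight and its acknowledgement got through.
    Returns the simulated time spent waiting, or None if every attempt failed.
    """
    waited = 0.0
    timeout = initial_timeout
    for _ in range(max_attempts):
        if send(flight):
            return waited
        waited += timeout                        # the timer expired without an ack
        timeout = min(timeout * 2, max_timeout)  # back off before retrying
    return None
```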
In the same vein, QUIC (as a combination of UDP-like flexibility and TCP-like robustness) will be an interesting protocol to investigate.
Upgrading the two-party protocol to an n-party protocol
In MLS-TLS, we deliberately downgraded MLS’ CGKA to a continuous key agreement (CKA) protocol by limiting the number of members in an MLS group to two. We can just as easily lift this restriction and add more members to MLS groups.
Giving a third party access to a two-party protocol has a shady history and can be very dangerous when carried out without the knowledge of one or even both parties. Luckily, MLS has a built-in safeguard against exactly this threat: all members of the group cryptographically agree on who the other members are.
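A simplified analogy illustrates the effect of that agreement. In MLS, the group context commits to the ratchet tree, and therefore to the membership, and it feeds into the key schedule. Members with diverging views of the group thus derive different keys, and their confirmation tags fail to verify. The following sketch mimics only that binding. `membership_digest` and `epoch_key` are hypothetical and far simpler than the tree hash in RFC 9420.

```python
import hashlib
import hmac
from typing import Iterable


def membership_digest(members: Iterable[bytes]) -> bytes:
    """Order-independent digest over the members' signature public keys."""
    h = hashlib.sha256()
    for key in sorted(members):
        h.update(len(key).to_bytes(2, "big") + key)  # length-prefix each key
    return h.digest()


def epoch_key(epoch_secret: bytes, members: Iterable[bytes]) -> bytes:
    """Bind the derived key to this party's view of who is in the group."""
    return hmac.new(epoch_secret, b"membership" + membership_digest(members),
                    hashlib.sha256).digest()
```

A member who was silently added on one side only would change that side's digest, so the two ends of the connection would no longer share a key.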
We can imagine scenarios where upgrading a two-party protocol to an n-party protocol could be useful. For example, when the ends of a two-party connection are only logical ends, but in reality consist of multiple clients.
Supporting more efficient post-quantum key rotations for better Post-Compromise Security (PCS)
Two major advantages of this protocol are a) the fact that it supports injecting fresh key material during the protocol and b) post-quantum resistant ciphersuites thanks to its crypto agility. The former is the mechanism that gives us Post-Compromise Security. Ideally, these key rotations happen as often as possible to keep the window of compromise as small as possible.
While this logic holds for key rotations with classical cryptographic primitives, it does not apply to the post-quantum part of the negotiation. We consider it unlikely that an attacker today both has a cryptographically relevant quantum computer and can frequently compromise endpoints. With that in mind, we can use the Flexible Hybrid PQ MLS Combiner to separate the two parts of the negotiation and perform the more expensive post-quantum resistant negotiations at a slower pace.
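Updating at two different rates can be sketched as follows. This is a hypothetical simplification and does not reproduce the Flexible Hybrid PQ MLS Combiner itself. Classical updates happen every epoch, a post-quantum update happens every `pq_interval` epochs, and each epoch secret mixes in the latest post-quantum secret. Random bytes stand in for the shared secrets of real classical and post-quantum KEM runs.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract with SHA-256 (RFC 5869)."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()


class TwoRateSchedule:
    """Hypothetical schedule: cheap classical updates often, PQ updates rarely."""

    def __init__(self, pq_interval: int) -> None:
        self.pq_interval = pq_interval
        self.epoch = 0
        self.pq_updates = 0
        self.pq_secret = bytes(32)     # refreshed only every `pq_interval` epochs
        self.epoch_secret = bytes(32)  # refreshed every epoch

    def pq_update_due(self) -> bool:
        return self.epoch % self.pq_interval == 0

    def advance(self, classical_fresh: bytes, pq_fresh: Optional[bytes] = None) -> bytes:
        if pq_fresh is not None:
            self.pq_secret = extract(self.pq_secret, pq_fresh)
            self.pq_updates += 1
        # Every epoch depends on the latest PQ secret and on fresh classical input.
        mixed = extract(self.pq_secret, classical_fresh)
        self.epoch_secret = extract(self.epoch_secret, mixed)
        self.epoch += 1
        return self.epoch_secret


schedule = TwoRateSchedule(pq_interval=10)
for _ in range(30):
    pq = secrets.token_bytes(32) if schedule.pq_update_due() else None
    schedule.advance(secrets.token_bytes(32), pq)
```

In this sketch, an attacker who later breaks only the classical component still lacks the post-quantum secret that every epoch secret depends on.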
We'd love to hear from you! Whether you're interested in experimenting with MLS-TLS, have questions about the implementation, or see potential applications we haven't considered – reach out at hello@phnx.im.